Monolingual Document Retrieval: English versus other European Languages

Authors

  • Jaap Kamps
  • Christof Monz
  • Maarten de Rijke
  • Börkur Sigurbjörnsson
Abstract

The vast majority of research in information retrieval is done using English collections and topics. This raises questions about the effectiveness of retrieval strategies for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding; of language-independent approaches, such as character n-gramming; and of the combination of the two types of approaches. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
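To make the language-independent approach concrete, here is a minimal sketch of character n-gramming in Python. It is an illustration only, not the authors' implementation: the function name `char_ngrams`, the default length `n=4`, the whitespace tokenization, and the choice to keep short words whole are all assumptions, since the abstract does not give these details.

```python
def char_ngrams(text, n=4):
    """Return overlapping character n-grams for each word in text.

    Assumptions (not taken from the paper): words are split on
    whitespace, lowercased, and n-grams do not cross word boundaries.
    Words of length n or shorter are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            # Short words cannot be split further; index them as-is.
            grams.append(word)
        else:
            # Slide a window of width n over the word.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    # A compound such as German "Fahrrad" is indexed by its substrings,
    # with no language-specific stemmer or compound splitter involved.
    print(char_ngrams("Fahrrad"))
```

Because the index terms are plain substrings, the same code applies unchanged to any language written with whitespace-separated words, which is what makes the approach language-independent.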

Similar resources

Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval

The Dublin City University group participated in the monolingual, bilingual and multilingual retrieval tasks this year. The main focus of our investigation this year was extending our retrieval system to document languages other than English, and completing the multilingual task comprising four languages: English, French, Russian and Finnish. Results from our French monolingual experiments indi...

Cross-Language Spoken Document Retrieval on the TREC SDR Collection

This paper presents preliminary experiments on cross-language spoken document retrieval (SDR) carried out on a benchmark assembled at ITC-irst. The benchmark is based on resources used in the last two spoken document retrieval tracks at the TREC conference, which are available on the Internet. They include automatic transcripts of American English broadcast news, short topics written in English,...

Cross-Lingual Information Retrieval System for Indian Languages

This paper describes our first participation in the Indian language sub-task of the main Adhoc monolingual and bilingual track in CLEF competition. In this track, the task is to retrieve relevant documents from an English corpus in response to a query expressed in different Indian languages including Hindi, Tamil, Telugu, Bengali and Marathi. Groups participating in this track are required to s...

Passage Retrieval vs. Document Retrieval in the Monolingual Task with the IR-n System

The paper describes our participation in monolingual tasks at CLEF 2006. We have submitted results for the following languages: English, French, Portuguese and Hungarian. We focused on studying different weighting schemes (Okapi and DFR) and retrieval strategies (passage retrieval and document retrieval) to improve retrieval performance. After an analysis of our experiments and of the official ...

Monolingual and Bilingual Experiments in GeoCLEF2006

This paper presents the results of our initial experiments in the monolingual English, Spanish and Portuguese tasks and the Bilingual Spanish → English, Spanish → Portuguese, English → Spanish and Portuguese → Spanish tasks. Twenty runs were submitted as official runs, thirteen for the monolingual task and seven for the bilingual task. We used the Terrier Information Retrieval Platform to run e...

Publication date: 2003